Set-Class Similarity, Voice Leading, and the Fourier Transform

Author

  • Dmitri Tymoczko
Abstract

In this article, I consider two ways to model distance (or inverse similarity) between chord types, one based on voice leading and the other on shared interval content. My goal is to provide a contrapuntal reinterpretation of Ian Quinn’s work, which uses the Fourier transform to quantify similarity of interval content. The first section of the article shows how to find the minimal voice leading between chord types or set-classes. The second uses voice leading to approximate the results of Quinn’s Fourier-based method. The third section explains how this is possible, while the fourth argues that voice leading is somewhat more flexible than the Fourier transform. I conclude with a few thoughts about realism and relativism in music theory.

Twentieth-century music often moves flexibly between contrasting harmonic regions: in the music of Stravinsky, Messiaen, Shostakovich, Ligeti, Crumb, and John Adams, we find diatonic passages alternating with moments of intense chromaticism, sometimes mediated by nondiatonic scales such as the whole-tone and octatonic. In some cases, the music moves continuously from one world to another, making it hard to identify precise boundaries between them. Yet we may still have the sense that a particular passage, melody, or scale is, for instance, fairly diatonic, more-or-less octatonic, or less diatonic than whole-tone. A challenge for music theory is to formalize these intuitions by proposing quantitative methods for locating musical objects along the spectrum of contemporary harmonic possibilities.

One approach to this problem uses voice leading: from this point of view, to say that two set-classes are similar is to say that any set of the first type can be transformed into one of the second without moving its notes very far.
Thus, the acoustic scale is similar to the diatonic because we can transform one into the other by a single-semitone shift; for example, the acoustic scale {C, D, E, F♯, G, A, B♭} can be made diatonic by the single-semitone displacement F♯ → F or B♭ → B. Similarly, when we judge the minor seventh chord to be very similar to the dominant seventh, we are saying that we can relate them by a single-semitone shift. This conception of similarity dates back to John Roeder’s work in the mid-1980s (1984, 1987) and has been developed more recently by Thomas Robinson (2006), Joe Straus (2007), and Clifton Callender, Ian Quinn, and myself (2008). The approach is consistent with the thought that composers, sitting at a piano keyboard, would judge chords to be similar when they can be linked by small physical motions.

Thanks to Rachel Hall, Justin Hoffman, Ian Quinn, Joe Straus, and in particular Clifton Callender, whose investigations into continuous Fourier transforms deeply influenced my thinking. Callender pursued his approach despite strenuous objections on my part, for which I am both appropriately grateful and duly chastened.

Journal of Music Theory

Another approach uses intervallic content: from this point of view, to say that set-classes are similar is to say that they contain similar collections of intervals. (That the two methods are different is shown by “Z-related” or “nontrivially homometric” sets, which contain the same intervals but are nonidentical according to voice leading.) In a fascinating pair of papers, Quinn has demonstrated that the Fourier transform can be used to quantify this approach.[1] Essentially, for any number n from 1 to 6, and every pitch class p in a chord, the Fourier transform assigns a two-dimensional vector whose components are

$$V_{p,n} = \left(\cos\frac{2\pi p n}{12},\ \sin\frac{2\pi p n}{12}\right). \tag{1}$$

Adding these vectors together, for one particular n and all the pitch classes p in the chord, produces a composite vector representing the chord as a whole: its “nth Fourier component.” The length (or “magnitude”) of this vector, Quinn astutely observes, reveals something about the chord’s harmonic character: in particular, chords saturated with (12/n)-semitone intervals, or intervals approximately equal to 12/n, tend to score highly on this index of chord quality.[2] The Fourier transform thus seems to capture the intuitive sense that chords can be more or less diminished-seventh-like, perfect-fifthy, or whole-tonish. It also seems to offer a distinctive approach to set-class similarity: from this point of view, two set-classes can be considered “similar” when their Fourier magnitudes are approximately equal, a situation that obtains when the chords have approximately the same intervals.

[1] See Quinn 2006 and 2007. Quinn’s use of the Fourier transform develops ideas in Lewin 1959 and 2001 and Vuza 1993.
[2] These magnitudes are the same for transpositionally or inversionally related chords, so it is reasonable to speak of a set-class’s Fourier magnitudes.

The interesting question is how these two conceptions relate. In recent years, a number of theorists have tried to reinterpret Quinn’s Fourier magnitudes using voice-leading distances. Robinson (2006), for example, pointed out that there is a strong anticorrelation between the magnitude of a chord’s first Fourier component and the size of the minimal voice leading to the nearest chromatic cluster. (See also Straus 2007, which echoes Robinson’s point.) However, neither Robinson nor Straus found an analogous interpretation of the other Fourier components. In an interesting article in this issue (see pages 219–49), Justin Hoffman extends this work, interpreting Fourier components in light of unusual “voice-leading lattices” in which voices move by distances other than one semitone. But despite this intriguing idea, the relation between Fourier analysis and more traditional conceptions of voice leading remains obscure.

The purpose of this article is to describe a general connection between the two approaches: it turns out that the magnitude of a chord’s nth Fourier component is approximately inversely related to the size of the minimal voice leading to the nearest subset of any perfectly even n-note chord.[3] For instance, a chord’s first Fourier component is approximately inversely related to the size of the minimal voice leading to any transposition of {0}; the second Fourier component is approximately inversely related to the size of the minimal voice leading to any transposition of either {0} or {0, 6}; the third component is approximately inversely related to the size of the minimal voice leading to any transposition of either {0}, {0, 4}, or {0, 4, 8}, and so on. Interestingly, however, we can see this connection clearly only when we model chords as multisets in continuous pitch-class space, following the approach of Callender, Quinn, and Tymoczko (2008). (This in fact may be one reason why previous theorists did not notice the relationship.) When we do adopt this perspective, we see that there is a deep relationship between two seemingly very different conceptions of set-class similarity, one grounded in voice leading, the other in interval content. Furthermore, this realization allows us to generalize some of the features of Quinn’s approach, using related methods that transcend some of the limitations of the Fourier transform proper.

I. Voice leading and set-class similarity

Let me begin by describing the voice-leading approach to set-class similarity (or inverse distance), reviewing along the way some basic definitions.
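Before turning to those definitions, the Fourier magnitudes of equation (1) can be made concrete with a short sketch. The function name `fourier_magnitude` and the three sample chords are assumptions introduced here for illustration; they are not drawn from the article or from Quinn's papers.

```python
import math

def fourier_magnitude(chord, n):
    """Length of the chord's nth Fourier component.

    Each pitch class p contributes the unit vector
    (cos(2*pi*p*n/12), sin(2*pi*p*n/12)); the vectors are summed
    and the length of the sum is returned.
    """
    x = sum(math.cos(2 * math.pi * p * n / 12) for p in chord)
    y = sum(math.sin(2 * math.pi * p * n / 12) for p in chord)
    return math.hypot(x, y)

# Illustrative chords (chosen for this sketch, not taken from the article).
chords = {
    "diminished seventh": [0, 3, 6, 9],
    "whole-tone scale": [0, 2, 4, 6, 8, 10],
    "diatonic scale": [0, 2, 4, 5, 7, 9, 11],
}

for name, chord in chords.items():
    magnitudes = [round(fourier_magnitude(chord, n), 3) for n in range(1, 7)]
    print(f"{name}: {magnitudes}")
```

In this sketch the diminished seventh chord reaches magnitude 4 on the fourth component and the whole-tone scale reaches 6 on the sixth, the largest values a four-note and a six-note chord can attain. That agrees with the observation that chords saturated with (12/n)-semitone intervals score highly on component n.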
Much of what follows is drawn from (or implicit in) earlier essays, including Tymoczko 2006 and 2008 and Callender, Quinn, and Tymoczko 2008; readers who want to explore these ideas further are hereby referred to these more in-depth discussions.

We can label pitch classes using real numbers (not just integers) in the range [0, 12), with C as 0.[4] Here the octave has size 12, and familiar twelve-tone equal-tempered semitones have size 1. This system provides labels for every conceivable pitch class and does not limit us to any particular scale; thus, the number 4.5 refers to “E quarter-tone sharp,” halfway between the twelve-tone equal-tempered pitch classes E and F.

[3] By “perfectly even n-note chord” I mean the chord that exactly divides the octave into n equally sized pieces, not necessarily lying in any familiar scale. For example, the perfectly even eight-note chord is {0, 1.5, 3, 4.5, 6, 7.5, 9, 10.5}.
[4] The notation [x, y) indicates a range that includes the lower bound x but not the upper bound y. Similarly (x, y) includes neither upper nor lower bounds, while [x, y] includes both.

A voice leading between pitch-class sets corresponds to a phrase like “the C major triad moves to E major by moving C down to B, holding E fixed, and shifting G up by semitone to G♯.” We can notate this more efficiently by writing $(C, E, G) \xrightarrow{-1,\ 0,\ 1} (B, E, G\sharp)$, indicating that C moves to B by one descending semitone, E moves to E by zero semitones, and G moves to G♯ by one ascending semitone. The order in which voices are listed is not important; thus, $(C, E, G) \xrightarrow{-1,\ 0,\ 1} (B, E, G\sharp)$ is the same as $(E, G, C) \xrightarrow{0,\ 1,\ -1} (E, G\sharp, B)$. The numbers above the arrows represent paths in pitch-class space, or directed distances such as “up two semitones,” “down seven semitones,” “up thirteen semitones,” and so on.
When the paths all lie in the range (−6, 6] I omit them; thus, a notation like (C, E, G) → (B, E, G♯) indicates that each voice moves to its destination along the shortest possible route, with the arbitrary convention being that tritones ascend. Formally, voice leadings between pitch-class sets can be modeled as multisets of ordered pairs, in which the first element is a pitch class and the second a real number representing a path in pitch-class space.

Voice leadings are bijective when they associate each element of one chord with precisely one element of the other. However, it matters whether we represent chords as sets (containing no duplications) or multisets (which may contain multiple copies of pitch classes). For example, the voice leading (C, C, E, G) → (A, C, F, F) is simultaneously a nonbijective voice leading between the sets {C, E, G} and {F, A, C} and also a bijective voice leading between the multisets {C, C, E, G} and {F, F, A, C}. For the purposes of this article, it is convenient to represent chords as multisets and to consider only bijective voice leadings between them. However, in other contexts, it can be useful to consider sets and nonbijective voice leadings.[5] It turns out to be a nontrivial task to devise an algorithm for measuring set-class similarity when nonbijective voice leadings are permitted. Fortunately, this complication is irrelevant here.

We measure the size of a voice leading using some function of (or partial order on) the nondirected distances moved by the individual voices. (These are the absolute values of the numbers above the arrows in the voice leading.) In principle, there are many different measures of voice-leading size but no compelling reason to choose one over another (Tymoczko 2006; Hall and Tymoczko 2007). In this article, however, it is convenient to use the Euclidean metric, according to which the size of a collection of real numbers $x_1, x_2, \ldots, x_n$ is $\sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$.
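The path convention and the Euclidean size can be sketched in a few lines. The helper names `shortest_path` and `euclidean_size` are hypothetical, introduced only for this illustration, and the sketch assumes that chords are given as lists of pitch-class numbers with C = 0.

```python
import math

def shortest_path(source, target):
    """Directed distance from source to target pitch class, in (-6, 6].

    A tritone comes out as +6, matching the convention that tritones ascend.
    """
    d = (target - source) % 12
    return d - 12 if d > 6 else d

def euclidean_size(paths):
    """Euclidean size of a voice leading: square root of the summed squared paths."""
    return math.sqrt(sum(x * x for x in paths))

# (C, E, G) -> (B, E, G#), with C = 0.
source = [0, 4, 7]
target = [11, 4, 8]
paths = [shortest_path(s, t) for s, t in zip(source, target)]
print(paths, euclidean_size(paths))

# Listing the voices in another order leaves the size unchanged.
reordered = [shortest_path(s, t) for s, t in zip([4, 7, 0], [4, 8, 11])]
print(sorted(paths) == sorted(reordered))
```

Because the size depends only on the absolute values of the paths, the direction of each motion matters for the notation but not for the measurement.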
The reasons for this choice are that the Euclidean metric (1) provides a reasonable approximation to a range of voice-leading measures (Hall and Tymoczko 2007), (2) is computationally tractable, and (3) is particularly well suited to the task of investigating the Fourier transform. The latter two points are clarified shortly.

[5] For example, one might consider the distance between C and E major seventh chords to be determined by the nonbijective voice leading (C, E, E, G, B) → (B, D♯, E, G♯, B), which is in fact smaller than the smallest four-voice voice leading between them. See Callender, Quinn, and Tymoczko 2008, supplementary section 7.

We can define the distance between two set-classes as the size of the minimal voice leading between any of their transpositions or inversions. The term “any” here means “any of their forms in continuous pitch-class space”; thus, when measuring distances between set-classes we cannot necessarily confine ourselves within any particular scale. For example, according to the Euclidean metric, the distance between the perfect fourth and major third is given not by the voice leading (C, F) → (D♭, F), with size 1, but by (C, F) → (C “quarter-tone sharp,” E “quarter-tone sharp”), that is, (0, 5) → (0.5, 4.5), with size $\sqrt{\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2} = \sqrt{\tfrac{1}{2}} \approx 0.707$. Though this may initially seem counterintuitive, it has on reflection a certain logic: if we are really interested in intrinsic relations between set-classes, then there is no reason to think that we can limit our attention to those that happen to appear in any one scale.

Now the Euclidean metric is particularly convenient for the following reason: if we are looking for the minimal voice leading between any two transpositions of any two chords, we need only consider those whose pitch classes sum to the same value modulo 12.
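The claim about matching sums can also be checked algebraically. The short derivation below is a sketch under one assumption: the pairing of voices is held fixed while the target chord is transposed by a real number $t$. The symbols $d_i$, $t$, and $S$ are introduced only for this illustration.

```latex
% d_i = y_i - x_i is the path of voice i before the target chord is transposed by t.
\begin{aligned}
S(t)^2 &= \sum_{i=1}^{n} (d_i + t)^2, \\
\frac{d}{dt}\, S(t)^2 &= 2 \sum_{i=1}^{n} (d_i + t) = 0
\quad\Longrightarrow\quad
t = -\frac{1}{n} \sum_{i=1}^{n} d_i .
\end{aligned}
```

At this value of $t$ the paths $d_i + t$ sum to zero, so the transposed target has the same pitch sum as the source, and the two chords therefore have the same pitch-class sum modulo 12. For the perfect fourth and major third, $d = (0, -1)$ gives $t = \tfrac{1}{2}$ and paths $(\tfrac{1}{2}, -\tfrac{1}{2})$, whose Euclidean size is $\sqrt{1/2} \approx 0.707$.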
(This in turn follows from basic facts of Cartesian geometry.)[6] For example, suppose we are trying to find the minimal Euclidean voice leading from the C augmented triad to any diminished triad. The pitch classes {C, E, G♯} are represented by the numbers {0, 4, 8}, which sum to $0 + 4 + 8 = 12 \equiv 0 \pmod{12}$. To find the nearest diminished triad, we need only consider those whose pitch classes sum to 0 (mod 12): {0, 3, 9}, {1, 4, 7}, and {5, 8, 11}. Observe that there are three, all related by major-third transposition. In general, we can always transpose an n-note chord by 12/n semitones without changing its sum (mod 12), and we can repeat this procedure n times before the initial chord reappears; thus, there will in general be n different transpositions of each n-note chord summing to the same number.[7] Note also that when searching for minimal voice leadings, we will frequently need to consider fractional pitch classes; for example, to find the nearest minor triad to {0, 4, 7}, we need to look at those summing to 11. These are $\{0\tfrac{1}{3}, 3\tfrac{1}{3}, 7\tfrac{1}{3}\}$, $\{4\tfrac{1}{3}, 7\tfrac{1}{3}, 11\tfrac{1}{3}\}$, and $\{3\tfrac{1}{3}, 8\tfrac{1}{3}, 11\tfrac{1}{3}\}$ (or, in other words, the familiar C minor, E minor, and A♭ minor triads, transposed up by one-third of a semitone). These chords, of course, do not reside on the ordinary piano keyboard.

[6] An ordered set can be modeled as a point in $\mathbb{R}^n$. Transposition corresponds to motion along the “unit diagonal” that contains both the origin and (1, 1, . . . , 1). Transpositional set-classes can thus be represented by lines parallel to the unit diagonal. The shortest vector between any two of these lines will (according to the Euclidean metric) be perpendicular to both. This means that the vector’s dot product with (1, 1, . . . , 1) will be equal to zero, which in turn implies that the sum of its components is zero. Hence, the coordinates of its endpoints sum to the same value.
[7] The qualification “in general” is needed because of symmetrical chords: when we transpose {0, 4, 8} by four semitones, we get the same chord again.

Finally, suppose that we have two chords $(x_1, x_2, \ldots, x_n)$ and $(y_1, y_2, \ldots, y_n)$ with each chord’s pitch classes listed in ascending numerical order (when considered as real numbers). To find the minimal voice leading between them, we need to consider n different circular permutations

$(x_1, x_2, \ldots, x_n) \to (y_1, y_2, \ldots, y_n)$,
$(x_1, x_2, \ldots, x_n) \to (y_2, y_3, \ldots, y_n, y_1)$,
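A minimal sketch of this search follows. It assumes that the remaining permutations continue the rotation shown in the two cases above, that notes wrapped past the end of the list are raised by an octave so the target stays in ascending order, and that only transpositions (not inversions) of the target are considered. The function name `nearest_transposition` is hypothetical. For each rotation, the sketch transposes the target so that the paths sum to zero, which is the sum-matching condition described earlier.

```python
import math

def nearest_transposition(x, y):
    """Smallest Euclidean voice leading from chord x to any transposition of chord y.

    Returns (size, target), where target lists the destination pitch class
    of each voice of sorted(x). Assumes both chords have the same number of
    notes and that only the n circular permutations need to be compared.
    """
    xs, ys = sorted(x), sorted(y)
    n = len(xs)
    if len(ys) != n:
        raise ValueError("chords must have the same number of notes")
    best = None
    for k in range(n):
        # Rotate the target by k places; wrapped notes go up an octave.
        rotated = ys[k:] + [p + 12 for p in ys[:k]]
        diffs = [t - s for s, t in zip(xs, rotated)]
        # Transposing the target by -shift makes the paths sum to zero.
        shift = sum(diffs) / n
        paths = [d - shift for d in diffs]
        size = math.sqrt(sum(p * p for p in paths))
        if best is None or size < best[0]:
            best = (size, [(t - shift) % 12 for t in rotated])
    return best

print(nearest_transposition([0, 5], [0, 4]))        # perfect fourth to major third
print(nearest_transposition([0, 4, 8], [0, 3, 6]))  # augmented to diminished triad
print(nearest_transposition([0, 4, 7], [0, 3, 7]))  # major to minor triad
```

For the perfect fourth and major third, this sketch returns a size of about 0.707 with target (0.5, 4.5), in agreement with the quarter-tone example given earlier.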



Publication date: 2010